{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>\n",
    "\n",
    "<i>Licensed under the MIT License.</i>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train SAR on MovieLens with Azure Machine Learning (Python, CPU)\n",
    "---\n",
    "## Introduction to Azure Machine Learning  \n",
    "The **[Azure Machine Learning service (AzureML)](https://docs.microsoft.com/azure/machine-learning/service/overview-what-is-azure-ml)** provides a cloud-based environment you can use to prep data, train, test, deploy, manage, and track machine learning models. By using Azure Machine Learning service, you can start training on your local machine and then scale out to the cloud. With many available compute targets, like [Azure Machine Learning Compute](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) and [Azure Databricks](https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks), and with [advanced hyperparameter tuning services](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters), you can build better models faster by using the power of the cloud.\n",
    "\n",
    "Data scientists and AI developers use the main [Azure Machine Learning Python SDK](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py) to build and run machine learning workflows with the Azure Machine Learning service. You can interact with the service in any Python environment, including Jupyter Notebooks or your favorite Python IDE. The Azure Machine Learning SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.\n",
    "![AzureML Workflow](https://docs.microsoft.com/en-us/azure/machine-learning/service/media/overview-what-is-azure-ml/aml.png)\n",
    "\n",
    "This notebook provides an example of how to utilize and evaluate the Simple Algorithm for Recommendation (SAR) algorithm using the Azure Machine Learning service. It takes the content of the [SAR quickstart notebook](sar_movielens.ipynb) and demonstrates how to use the power of the cloud to manage data, switch to powerful GPU machines, and monitor runs while training a model. \n",
    "\n",
    "See the hyperparameter tuning notebook for more advanced use cases with AzureML.\n",
    "\n",
    "### Advantages of using AzureML:\n",
    "- Manage cloud resources for monitoring, logging, and organizing your machine learning experiments.\n",
    "- Train models either locally or by using cloud resources, including GPU-accelerated model training.\n",
    "- Easy to scale out when dataset grows - by just creating and pointing to new compute target\n",
    "\n",
    "---\n",
    "## Details of SAR\n",
    "<details>\n",
    "    <summary>Click to expand</summary>\n",
    "    \n",
    "SAR is a fast scalable adaptive algorithm for personalized recommendations based on user transaction history. It produces easily explainable / interpretable recommendations and handles \"cold item\" and \"semi-cold user\" scenarios. SAR is a kind of neighborhood based algorithm (as discussed in [Recommender Systems by Aggarwal](https://dl.acm.org/citation.cfm?id=2931100)) which is intended for ranking top items for each user. \n",
    "\n",
    "SAR recommends items that are most ***similar*** to the ones that the user already has an existing ***affinity*** for. Two items are ***similar*** if the users who have interacted with one item are also likely to have interacted with another. A user has an ***affinity*** to an item if they have interacted with it in the past.\n",
    "\n",
    "### Advantages of SAR:\n",
    "- High accuracy for an easy to train and deploy algorithm\n",
    "- Fast training, only requiring simple counting to construct matrices used at prediction time\n",
    "- Fast scoring, only involving multiplication of the similarity matric with an affinity vector\n",
    "\n",
    "### Notes to use SAR properly:\n",
    "- SAR does not use item or user features, so cannot handle cold-start use cases\n",
    "- SAR requires the creation of an $mxm$ dense matrix (where $m$ is the number of items). So memory consumption can be an issue with large numbers of items.\n",
    "- SAR is best used for ranking items per user, as the scale of predicted ratings may be different from the input range and will differ across users.\n",
    "For more details see the deep dive notebook on SAR here: [SAR Deep Dive Notebook](../02_model_collaborative_filtering/sar_deep_dive.ipynb)</details>\n",
    "---\n",
    "## Prerequisities\n",
    "   - **Azure Subscription**\n",
    "     - If you don’t have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service today](https://azure.microsoft.com/en-us/free/services/machine-learning/).\n",
    "     - You get credits to spend on Azure services, which will easily cover the cost of running this example notebook. After they're used up, you can keep the account and use [free Azure services](https://azure.microsoft.com/en-us/free/). Your credit card is never charged unless you explicitly change your settings and ask to be charged. Or [activate MSDN subscriber benefits](https://azure.microsoft.com/en-us/pricing/member-offers/credit-for-visual-studio-subscribers/), which give you credits every month that you can use for paid Azure services.\n",
    "---   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "azureml.core version: 1.0.18\n"
     ]
    }
   ],
   "source": [
    "# set the environment path to find Recommenders\n",
    "import sys\n",
    "sys.path.append(\"../../\")\n",
    "\n",
    "import os\n",
    "import shutil\n",
    "import numpy as np\n",
    "from tempfile import TemporaryDirectory\n",
    "\n",
    "import azureml\n",
    "from azureml.core import Workspace, Run, Experiment\n",
    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
    "from azureml.train.estimator import Estimator\n",
    "from azureml.widgets import RunDetails\n",
    "\n",
    "from reco_utils.azureml.azureml_utils import get_or_create_workspace\n",
    "from reco_utils.dataset import movielens\n",
    "\n",
    "print(\"azureml.core version: {}\".format(azureml.core.VERSION))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "# top k items to recommend\n",
    "TOP_K = 10\n",
    "\n",
    "# Select Movielens data size: 100k, 1m, 10m, or 20m\n",
    "MOVIELENS_DATA_SIZE = '100k'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Connect to an AzureML workspace\n",
    "\n",
    "An [AzureML Workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inferencing, and the monitoring of deployed models.\n",
    "\n",
    "The function below will get or create an AzureML Workspace and save the configuration to `aml_config/config.json`.\n",
    "\n",
    "It defaults to use provided input parameters or environment variables for the Workspace configuration values. Otherwise, it will use an existing configuration file (either at `./aml_config/config.json` or a path specified by the config_path parameter).\n",
    "\n",
    "Lastly, if the workspace does not exist, one will be created for you. See [this tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace#portal) to locate information such as subscription id."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ws = get_or_create_workspace(\n",
    "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
    "    resource_group=\"<RESOURCE_GROUP>\",\n",
    "    workspace_name=\"<WORKSPACE_NAME>\",\n",
    "    workspace_region=\"<WORKSPACE_REGION>\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a Temporary Directory\n",
    "This directory will house the data and scripts needed by the AzureML Workspace"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "tmp_dir = TemporaryDirectory()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download dataset and upload to datastore\n",
    "\n",
    "Every workspace comes with a default [datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and access it from the compute target.\n",
    "\n",
    "The data files are uploaded into a directory named `data` at the root of the datastore."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 4.81k/4.81k [00:02<00:00, 1.98kKB/s]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "$AZUREML_DATAREFERENCE_57dbc7117f67479892135cec2819b78b"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "TARGET_DIR = 'movielens'\n",
    "\n",
    "# download dataset\n",
    "data = movielens.load_pandas_df(\n",
    "    size=MOVIELENS_DATA_SIZE,\n",
    "    header=['UserId','MovieId','Rating','Timestamp']\n",
    ")\n",
    "\n",
    "# upload dataset to workspace datastore\n",
    "data_file_name = \"movielens_\" + MOVIELENS_DATA_SIZE + \"_data.pkl\"\n",
    "data.to_pickle(os.path.join(tmp_dir.name, data_file_name))\n",
    "\n",
    "ds = ws.get_default_datastore()\n",
    "ds.upload(src_dir=tmp_dir.name, target_path=TARGET_DIR, overwrite=True, show_progress=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create or Attach Azure Machine Learning Compute \n",
    "\n",
    "We create a cpu cluster as our **remote compute target**. If a cluster with the same name already exists in your workspace, the script will load it instead. You can read [Set up compute targets for model training](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets) to learn more about setting up compute target on different locations. You can also create GPU machines when larger machines are necessary to train the model.\n",
    "\n",
    "According to Azure [Pricing calculator](https://azure.microsoft.com/en-us/pricing/calculator/), with example VM size `STANDARD_D2_V2`, it costs a few dollars to run this notebook, which is well covered by Azure new subscription credit. For billing and pricing questions, please contact [Azure support](https://azure.microsoft.com/en-us/support/options/).\n",
    "\n",
    "**Note**:\n",
    "- 10m and 20m dataset requires more capacity than `STANDARD_D2_V2`, such as `STANDARD_NC6` or `STANDARD_NC12`. See list of all available VM sizes [here](https://docs.microsoft.com/en-us/azure/templates/Microsoft.Compute/2018-10-01/virtualMachines?toc=%2Fen-us%2Fazure%2Fazure-resource-manager%2Ftoc.json&bc=%2Fen-us%2Fazure%2Fbread%2Ftoc.json#hardwareprofile-object).\n",
    "- As with other Azure services, there are limits on certain resources (e.g. AzureML Compute quota) associated with the Azure Machine Learning service. Please read [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request) on the default limits and how to request more quota.\n",
    "---\n",
    "#### Learn more about Azure Machine Learning Compute\n",
    "<details>\n",
    "    <summary>Click to learn more about compute types</summary>\n",
    "    \n",
    "[Azure Machine Learning Compute](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) is managed compute infrastructure that allows the user to easily create single to multi-node compute of the appropriate VM Family. It is created within your workspace region and is a resource that can be used by other users in your workspace. It autoscales by default to the max_nodes, when a job is submitted, and executes in a containerized environment packaging the dependencies as specified by the user.\n",
    "\n",
    "Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service.\n",
    "\n",
    "You can provision a persistent AzureML Compute resource by simply defining two parameters thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continously re-use the same target, debug it between jobs or simply share the resource with other users of your workspace.\n",
    "\n",
    "In addition to vm_size and max_nodes, you can specify:\n",
    "- **min_nodes**: Minimum nodes (default 0 nodes) to downscale to while running a job on AzureML Compute\n",
    "- **vm_priority**: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AzureML Compute. Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted\n",
    "- **idle_seconds_before_scaledown**: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes\n",
    "- **vnet_resourcegroup_name**: Resource group of the existing VNet within which Azure MLCompute should be provisioned\n",
    "- **vnet_name**: Name of VNet\n",
    "- **subnet_name**: Name of SubNet within the VNet\n",
    "</details>\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Creating a new compute target...\n",
      "Creating\n",
      "Succeeded\n",
      "AmlCompute wait for completion finished\n",
      "Minimum number of nodes requested have been provisioned\n"
     ]
    }
   ],
   "source": [
    "# Remote compute (cluster) configuration. If you want to save the cost more, set these to small.\n",
    "VM_SIZE = 'STANDARD_D2_V2'\n",
    "# Cluster nodes\n",
    "MIN_NODES = 0\n",
    "MAX_NODES = 2\n",
    "\n",
    "CLUSTER_NAME = 'cpucluster'\n",
    "\n",
    "try:\n",
    "    compute_target = ComputeTarget(workspace=ws, name=CLUSTER_NAME)\n",
    "    print(\"Found existing compute target\")\n",
    "except:\n",
    "    print(\"Creating a new compute target...\")\n",
    "    # Specify the configuration for the new cluster\n",
    "    compute_config = AmlCompute.provisioning_configuration(\n",
    "        vm_size=VM_SIZE,\n",
    "        min_nodes=MIN_NODES,\n",
    "        max_nodes=MAX_NODES\n",
    "    )\n",
    "    # Create the cluster with the specified name and configuration\n",
    "    compute_target = ComputeTarget.create(ws, CLUSTER_NAME, compute_config)\n",
    "    # Wait for the cluster to complete, show the output log\n",
    "    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Prepare training script\n",
    "### 1. Create a directory\n",
    "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "SCRIPT_DIR = os.path.join(tmp_dir.name, 'movielens-sar')\n",
    "os.makedirs(SCRIPT_DIR, exist_ok=True)\n",
    "TRAIN_FILE = os.path.join(SCRIPT_DIR, 'train.py')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Create a training script\n",
    "To submit the job to the cluster, first create a training script. Run the following code to create the training script called `train.py` in temporary directory. This training adds a regularization rate to the training algorithm, so produces a slightly different model than the local version.\n",
    "\n",
    "This code takes what is in the local quickstart and convert it to one single training script. We use run.log() to record parameters to the run. We will be able to review and compare these measures in the Azure Portal at a later time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing /tmp/tmp6imrlt0z/movielens-sar/train.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile $TRAIN_FILE\n",
    "\n",
    "import argparse\n",
    "import os\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import itertools\n",
    "import logging\n",
    "\n",
    "from azureml.core import Run\n",
    "from sklearn.externals import joblib\n",
    "\n",
    "from reco_utils.common.timer import Timer\n",
    "from reco_utils.dataset import movielens\n",
    "from reco_utils.dataset.python_splitters import python_stratified_split\n",
    "from reco_utils.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k, recall_at_k\n",
    "from reco_utils.recommender.sar.sar_singlenode import SARSingleNode\n",
    "\n",
    "\n",
    "logging.basicConfig(level=logging.DEBUG, \n",
    "                    format='%(asctime)s %(levelname)-8s %(message)s')\n",
    "\n",
    "\n",
    "TARGET_DIR = 'movielens'\n",
    "OUTPUT_FILE_NAME = 'outputs/movielens_sar_model.pkl'\n",
    "MODEL_FILE_NAME = 'movielens_sar_model.pkl'\n",
    "\n",
    "\n",
    "# get hold of the current run\n",
    "run = Run.get_context()\n",
    "\n",
    "# let user feed in 2 parameters, the location of the data files (from datastore), and the regularization rate of the logistic regression model\n",
    "parser = argparse.ArgumentParser()\n",
    "parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')\n",
    "parser.add_argument('--data-file', type=str, dest='data_file', help='data file name')\n",
    "parser.add_argument('--top-k', type=int, dest='top_k', default=10, help='top k items to recommend')\n",
    "parser.add_argument('--data-size', type=str, dest='data_size', default=10, help='Movielens data size: 100k, 1m, 10m, or 20m')\n",
    "args = parser.parse_args()\n",
    "\n",
    "# set col names\n",
    "header = {\n",
    "    \"col_user\": \"UserId\",\n",
    "    \"col_item\": \"MovieId\",\n",
    "    \"col_rating\": \"Rating\",\n",
    "    \"col_timestamp\": \"Timestamp\",\n",
    "}\n",
    "\n",
    "# read data\n",
    "data_pickle_path = os.path.join(args.data_folder, args.data_file)\n",
    "data = pd.read_pickle(path=data_pickle_path)\n",
    "\n",
    "# Log arguments to the run for tracking\n",
    "run.log(\"top-k\", args.top_k)\n",
    "run.log(\"data-size\", args.data_size)\n",
    "\n",
    "# split dataset into train and test\n",
    "train, test = python_stratified_split(data, ratio=0.75, col_user=header[\"col_user\"], col_item=header[\"col_item\"], seed=42)\n",
    "\n",
    "# instantiate the model\n",
    "model = SARSingleNode(\n",
    "    similarity_type=\"jaccard\", \n",
    "    time_decay_coefficient=30, \n",
    "    time_now=None, \n",
    "    timedecay_formula=True, \n",
    "    **header\n",
    ")\n",
    "\n",
    "# train the SAR model\n",
    "with Timer() as t:\n",
    "    model.fit(train)\n",
    "\n",
    "run.log(name=\"Training time\", value=t.interval)\n",
    "\n",
    "# predict top k items\n",
    "with Timer() as t:\n",
    "    top_k = model.recommend_k_items(test, remove_seen=True)\n",
    "\n",
    "run.log(name=\"Prediction time\", value=t.interval)\n",
    "\n",
    "# compute evaluation metrics\n",
    "eval_map = map_at_k(test, top_k, col_user=\"UserId\", col_item=\"MovieId\", \n",
    "                    col_rating=\"Rating\", col_prediction=\"prediction\", \n",
    "                    relevancy_method=\"top_k\", k=args.top_k)\n",
    "eval_ndcg = ndcg_at_k(test, top_k, col_user=\"UserId\", col_item=\"MovieId\", \n",
    "                      col_rating=\"Rating\", col_prediction=\"prediction\", \n",
    "                      relevancy_method=\"top_k\", k=args.top_k)\n",
    "eval_precision = precision_at_k(test, top_k, col_user=\"UserId\", col_item=\"MovieId\", \n",
    "                                col_rating=\"Rating\", col_prediction=\"prediction\", \n",
    "                                relevancy_method=\"top_k\", k=args.top_k)\n",
    "eval_recall = recall_at_k(test, top_k, col_user=\"UserId\", col_item=\"MovieId\", \n",
    "                          col_rating=\"Rating\", col_prediction=\"prediction\", \n",
    "                          relevancy_method=\"top_k\", k=args.top_k)\n",
    "\n",
    "run.log(\"map\", eval_map)\n",
    "run.log(\"ndcg\", eval_ndcg)\n",
    "run.log(\"precision\", eval_precision)\n",
    "run.log(\"recall\", eval_recall)\n",
    "\n",
    "# automatic upload of everything in ./output folder doesn't work for very large model file\n",
    "# model file has to be saved to a temp location, then uploaded by upload_file function\n",
    "joblib.dump(value=model, filename=MODEL_FILE_NAME)\n",
    "\n",
    "run.upload_file(OUTPUT_FILE_NAME, MODEL_FILE_NAME)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'/tmp/tmp6imrlt0z/movielens-sar/reco_utils'"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# copy dependent python files\n",
    "UTILS_DIR = os.path.join(SCRIPT_DIR, 'reco_utils')\n",
    "if os.path.exists(UTILS_DIR):\n",
    "    shutil.rmtree(UTILS_DIR)\n",
    "shutil.copytree('../../reco_utils/', UTILS_DIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run training script\n",
    "### 1. Create an estimator\n",
    "An [estimator](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-ml-models) object is used to submit the run. You can create and use a generic Estimator to submit a training script using any learning framework you choose (such as scikit-learn) you want to run on any compute target, whether it's your local machine, a single VM in Azure, or a GPU cluster in Azure. \n",
    "\n",
    "Create your estimator by running the following code to define:  \n",
    "* The name of the estimator object, `est`\n",
    "* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n",
    "* The compute target.  In this case you will use the AzureML Compute you created\n",
    "* The training script name, train.py\n",
    "* Parameters required from the training script \n",
    "* Python packages needed for training\n",
    "* Connect to the data files in the datastore\n",
    "\n",
    "In this tutorial, this target is AzureML Compute. All files in the script folder are uploaded into the cluster nodes for execution. `ds.as_mount()` mounts a datastore on the remote compute and returns the folder. See documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data#access-datastores-during-training)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "tags": [
     "configure estimator"
    ]
   },
   "outputs": [],
   "source": [
    "script_params = {\n",
    "    '--data-folder': ds.as_mount(),\n",
    "    '--data-file': 'movielens/' + data_file_name,\n",
    "    '--top-k': TOP_K,\n",
    "    '--data-size': MOVIELENS_DATA_SIZE\n",
    "}\n",
    "\n",
    "est = Estimator(source_directory=SCRIPT_DIR,\n",
    "                script_params=script_params,\n",
    "                compute_target=compute_target,\n",
    "                entry_script='train.py',\n",
    "                conda_packages=['pandas'],\n",
    "                pip_packages=['sklearn', 'tqdm'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Submit the job to the cluster\n",
    "An [experiment](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py#experiment) is a logical container in an AzureML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments. We access an experiment from our AzureML workspace by name, which will be created if it doesn't exist.\n",
    "\n",
    "Then, run the experiment by submitting the estimator object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create experiment\n",
    "EXPERIMENT_NAME = 'movielens-sar'\n",
    "exp = Experiment(workspace=ws, name=EXPERIMENT_NAME)\n",
    "\n",
    "run = exp.submit(config=est)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### 3.  Monitor remote run\n",
    "\n",
    "#### Jupyter widget\n",
    "\n",
    "Jupyter widget can watch the progress of the run.  Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "145080f4fc0e47c8a892ff7db3f3c08b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "RunDetails(run).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Viewing run results\n",
    "Azure Machine Learning stores all the details about the run in the Azure cloud. Let's access those details by retrieving a link to the run using the default run output. Clicking on the resulting link will take you to an interactive page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table style=\"width:100%\"><tr><th>Experiment</th><th>Id</th><th>Type</th><th>Status</th><th>Details Page</th><th>Docs Page</th></tr><tr><td>movielens-sar</td><td>movielens-sar_1575027796_199dd2c6</td><td>azureml.scriptrun</td><td>Completed</td><td><a href=\"https://mlworkspace.azure.ai/portal/subscriptions/0ca618d2-22a8-413a-96d0-0f1b531129c3/resourceGroups/recommenders_project_resources/providers/Microsoft.MachineLearningServices/workspaces/reco_azureml/experiments/movielens-sar/runs/movielens-sar_1575027796_199dd2c6\" target=\"_blank\" rel=\"noopener\">Link to Azure Portal</a></td><td><a href=\"https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.script_run.ScriptRun?view=azure-ml-py\" target=\"_blank\" rel=\"noopener\">Link to Documentation</a></td></tr></table>"
      ],
      "text/plain": [
       "Run(Experiment: movielens-sar,\n",
       "Id: movielens-sar_1575027796_199dd2c6,\n",
       "Type: azureml.scriptrun,\n",
       "Status: Completed)"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Above cell should output similar table as below.\n",
    "![Experiment submit output](https://recodatasets.z20.web.core.windows.net/images/aml_sar_output.jpg)\n",
    "After clicking \"Link to Azure Portal\", experiment run details tab looks like this with logged metrics.\n",
    "![Azure Portal Experiment](https://recodatasets.z20.web.core.windows.net/images/aml_sar_workspace.jpg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'top-k': 10,\n",
       " 'data-size': '100k',\n",
       " 'Training time': 0.4077951559999633,\n",
       " 'Prediction time': 0.13354294300000902,\n",
       " 'map': 0.11059057578638949,\n",
       " 'ndcg': 0.3824612290501957,\n",
       " 'precision': 0.33075291622481445,\n",
       " 'recall': 0.1763854474342893}"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# run below after run is complete, otherwise metrics is empty\n",
    "metrics = run.get_metrics()\n",
    "metrics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deprovision compute resource\n",
    "To avoid unnecessary charges, if you created compute target that doesn't scale down to 0, make sure the compute target is deprovisioned after use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "# delete () is used to deprovision and delete the AzureML Compute target. \n",
    "# do not run below before experiment completes\n",
    "\n",
    "# compute_target.delete()\n",
    "\n",
    "# deletion will take a few minutes. You can check progress in Azure Portal / Computing tab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "# clean up temporary directory\n",
    "tmp_dir.cleanup()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python (reco_base)",
   "language": "python",
   "name": "reco_base"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}